{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ur8xi4C7S06n"
      },
      "outputs": [],
      "source": [
        "# Copyright 2021 Google LLC\n",
        "#\n",
        "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "#     https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JAPoU8Sm5E6e"
      },
      "source": [
        "<table align=\"left\">\n",
        "\n",
        "  <td>\n",
        "    <a href=\"https://console.cloud.google.com/vertex-ai/notebooks/deploy-notebook?download_url=https://github.com/GoogleCloudPlatform/ai-platform-samples/blob/master/ai-platform-unified/notebooks/unofficial/matching_engine/matching_engine_for_indexing.ipynb\">\n",
        "      Run in Google Cloud Notebooks\n",
        "    </a>\n",
        "  </td>\n",
        "  <td>\n",
        "    <a href=\"https://github.com/GoogleCloudPlatform/ai-platform-samples/blob/master/ai-platform-unified/notebooks/unofficial/matching_engine/matching_engine_for_indexing.ipynb\">\n",
        "      <img src=\"https://cloud.google.com/ml-engine/images/github-logo-32px.png\" alt=\"GitHub logo\">\n",
        "      View on GitHub\n",
        "    </a>\n",
        "  </td>\n",
        "</table>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "tvgnzT1CKxrO"
      },
      "source": [
        "## Overview\n",
        "\n",
        "This example demonstrates how to use the GCP ANN Service. It is a high scale, low latency solution, to find similar vectors (or more specifically \"embeddings\") for a large corpus. Moreover, it is a fully managed offering, further reducing operational overhead. It is built upon [Approximate Nearest Neighbor (ANN) technology](https://ai.googleblog.com/2020/07/announcing-scann-efficient-vector.html) developed by Google Research.\n",
        "\n",
        "### Dataset\n",
        "\n",
        "The dataset used for this tutorial is the [GloVe dataset](https://nlp.stanford.edu/projects/glove/).\n",
        "\n",
        "### Objective\n",
        "\n",
        "In this notebook, you will learn how to create Approximate Nearest Neighbor (ANN) Index, query against indexes, and validate the performance of the index. \n",
        "\n",
        "The steps performed include:\n",
        "\n",
        "* Create ANN Index and Brute Force Index\n",
        "* Create an IndexEndpoint with VPC Network\n",
        "* Deploy ANN Index and Brute Force Index\n",
        "* Perform online query\n",
        "* Compute recall\n",
        "\n",
        "\n",
        "### Costs \n",
        "\n",
        "This tutorial uses billable components of Google Cloud:\n",
        "\n",
        "* Vertex AI\n",
        "* Cloud Storage\n",
        "\n",
        "Learn about [Vertex AI\n",
        "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n",
        "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n",
        "Calculator](https://cloud.google.com/products/calculator/)\n",
        "to generate a cost estimate based on your projected usage."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "S5zc4kbEiYCm"
      },
      "source": [
        "## Before you begin\n",
        "\n",
        "* **Prepare a VPC network**.  To reduce any network overhead that might lead to unnecessary increase in overhead latency, it is best to call the ANN endpoints from your VPC via a direct [VPC Peering](https://cloud.google.com/vertex-ai/docs/general/vpc-peering) connection. The following section describes how to setup a VPC Peering connection if you don't have one. This is a one-time initial setup task. You can also reuse existing VPC network and skip this section.\n",
        "* **WARNING:** The match service gRPC API (to create online queries against your deployed index) has to be executed in a Google Cloud Notebook instance that is created with the following requirements:\n",
        "  * **In the same region as where your ANN service is deployed** (for example, if you set `REGION = \"us-central1\"` as same as the tutorial, the notebook instance has to be in `us-central1`).\n",
        "  * **Make sure you select the VPC network you created for ANN service** (instead of using the \"default\" one). That is, you will have to create the VPC network below and then create a new notebook instance that uses that VPC.  \n",
        "  * If you run it in the colab or a Google Cloud Notebook instance in a different VPC network or region, the gRPC API will fail to peer the network (InactiveRPCError)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "lW2LneA5mmmP"
      },
      "outputs": [],
      "source": [
        "PROJECT_ID = \"<your_project_id>\"  # @param {type:\"string\"}\n",
        "NETWORK_NAME = \"ucaip-haystack-vpc-network\"  # @param {type:\"string\"}\n",
        "PEERING_RANGE_NAME = \"ucaip-haystack-range\"\n",
        "\n",
        "# Create a VPC network\n",
        "! gcloud compute networks create {NETWORK_NAME} --bgp-routing-mode=regional --subnet-mode=auto --project={PROJECT_ID}\n",
        "\n",
        "# Add necessary firewall rules\n",
        "! gcloud compute firewall-rules create {NETWORK_NAME}-allow-icmp --network {NETWORK_NAME} --priority 65534 --project {PROJECT_ID} --allow icmp\n",
        "\n",
        "! gcloud compute firewall-rules create {NETWORK_NAME}-allow-internal --network {NETWORK_NAME} --priority 65534 --project {PROJECT_ID} --allow all --source-ranges 10.128.0.0/9\n",
        "\n",
        "! gcloud compute firewall-rules create {NETWORK_NAME}-allow-rdp --network {NETWORK_NAME} --priority 65534 --project {PROJECT_ID} --allow tcp:3389\n",
        "\n",
        "! gcloud compute firewall-rules create {NETWORK_NAME}-allow-ssh --network {NETWORK_NAME} --priority 65534 --project {PROJECT_ID} --allow tcp:22\n",
        "\n",
        "# Reserve IP range\n",
        "! gcloud compute addresses create {PEERING_RANGE_NAME} --global --prefix-length=16 --network={NETWORK_NAME} --purpose=VPC_PEERING --project={PROJECT_ID} --description=\"peering range for uCAIP Haystack.\"\n",
        "\n",
        "# Set up peering with service networking\n",
        "! gcloud services vpc-peerings connect --service=servicenetworking.googleapis.com --network={NETWORK_NAME} --ranges={PEERING_RANGE_NAME} --project={PROJECT_ID}"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "d3uj8x73nDX_"
      },
      "source": [
        "* Authentication: `$ gcloud auth login` rerun this in Google Cloud Notebook terminal when you are logged out and need the credential again."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "i7EUnXsZhAGF"
      },
      "source": [
        "### Installation\n",
        "\n",
        "Download and install the latest (preview) version of the Vertex SDK for Python."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "wyy5Lbnzg5fi"
      },
      "outputs": [],
      "source": [
        "! pip install -U git+https://github.com/googleapis/python-aiplatform.git@main-test --user"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "irSMQn6gZ19l"
      },
      "source": [
        "Install the `h5py` to prepare sample dataset, and the `grpcio-tools` for querying against the index. "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "-h5sqwOEZ5Yq"
      },
      "outputs": [],
      "source": [
        "! pip install -U grpcio-tools --user\n",
        "! pip install -U h5py --user"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "hhq5zEbGg0XX"
      },
      "source": [
        "### Restart the kernel\n",
        "\n",
        "After you install the additional packages, you need to restart the notebook kernel so it can find the packages."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "EzrelQZ22IZj"
      },
      "outputs": [],
      "source": [
        "# Automatically restart kernel after installs\n",
        "import os\n",
        "\n",
        "if not os.getenv(\"IS_TESTING\"):\n",
        "    # Automatically restart kernel after installs\n",
        "    import IPython\n",
        "\n",
        "    app = IPython.Application.instance()\n",
        "    app.kernel.do_shutdown(True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "BF1j6f9HApxa"
      },
      "source": [
        "### Set up your Google Cloud project\n",
        "\n",
        "**The following steps are required, regardless of your notebook environment.**\n",
        "\n",
        "1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager).\n",
        "\n",
        "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n",
        "\n",
        "1. [Enable the Vertex AI API and Compute Engine API, and Service Networking API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,servicenetworking.googleapis.com).\n",
        "\n",
        "1. Enter your project ID in the cell below. Then run the cell to make sure the\n",
        "Cloud SDK uses the right project for all the commands in this notebook.\n",
        "\n",
        "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "WReHDGG5g0XY"
      },
      "source": [
        "#### Set your project ID\n",
        "\n",
        "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "oM1iC_MfAts1"
      },
      "outputs": [],
      "source": [
        "import os\n",
        "\n",
        "PROJECT_ID = \"\"\n",
        "\n",
        "# Get your Google Cloud project ID from gcloud\n",
        "if not os.getenv(\"IS_TESTING\"):\n",
        "    shell_output=!gcloud config list --format 'value(core.project)' 2>/dev/null\n",
        "    PROJECT_ID = shell_output[0]\n",
        "    print(\"Project ID: \", PROJECT_ID)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "qJYoRfYng0XZ"
      },
      "source": [
        "Otherwise, set your project ID here."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "riG_qUokg0XZ"
      },
      "outputs": [],
      "source": [
        "if PROJECT_ID == \"\" or PROJECT_ID is None:\n",
        "    PROJECT_ID = \"<your_project_id>\"  # @param {type:\"string\"}"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "zgPO1eR3CYjk"
      },
      "source": [
        "### Create a Cloud Storage bucket\n",
        "\n",
        "**The following steps are required, regardless of your notebook environment.**\n",
        "\n",
        "Set the name of your Cloud Storage bucket below. It must be unique across all\n",
        "Cloud Storage buckets.\n",
        "\n",
        "You may also change the `REGION` variable, which is used for operations\n",
        "throughout the rest of this notebook. Make sure to [choose a region where Vertex AI services are\n",
        "available](https://cloud.google.com/vertex-ai/docs/general/locations#available_regions). You may\n",
        "not use a Multi-Regional Storage bucket for training with Vertex AI."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "MzGDU7TWdts_"
      },
      "outputs": [],
      "source": [
        "BUCKET_NAME = \"gs://[your-bucket-name]\"  # @param {type:\"string\"}\n",
        "REGION = \"us-central1\"  # @param {type:\"string\"}"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "cf221059d072"
      },
      "outputs": [],
      "source": [
        "from datetime import datetime\n",
        "\n",
        "TIMESTAMP = datetime.now().strftime(\"%Y%m%d%H%M%S\")\n",
        "\n",
        "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"gs://[your-bucket-name]\":\n",
        "    BUCKET_NAME = \"gs://\" + PROJECT_ID + \"aip-\" + TIMESTAMP"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-EcIXiGsCePi"
      },
      "source": [
        "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "NIq7R4HZCfIc"
      },
      "outputs": [],
      "source": [
        "! gsutil mb -l $REGION $BUCKET_NAME"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ucvCsknMCims"
      },
      "source": [
        "Finally, validate access to your Cloud Storage bucket by examining its contents:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "vhOb7YnwClBb"
      },
      "outputs": [],
      "source": [
        "! gsutil ls -al $BUCKET_NAME"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "XoEqT2Y4DJmf"
      },
      "source": [
        "### Import libraries and define constants"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Y9Uo3tifg1kx"
      },
      "source": [
        "Import the Vertex AI (unified) client library into your Python environment. \n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "f2d05ab4126a"
      },
      "outputs": [],
      "source": [
        "import time\n",
        "\n",
        "import grpc\n",
        "import h5py\n",
        "from google.cloud import aiplatform_v1beta1\n",
        "from google.protobuf import struct_pb2"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "pRUOFELefqf1"
      },
      "outputs": [],
      "source": [
        "REGION = \"us-central1\"\n",
        "ENDPOINT = \"{}-aiplatform.googleapis.com\".format(REGION)\n",
        "NETWORK_NAME = \"ucaip-haystack-vpc-network\"  # @param {type:\"string\"}\n",
        "\n",
        "\n",
        "AUTH_TOKEN = !gcloud auth print-access-token\n",
        "PROJECT_NUMBER = !gcloud projects list --filter=\"PROJECT_ID:'{PROJECT_ID}'\" --format='value(PROJECT_NUMBER)'\n",
        "PROJECT_NUMBER = PROJECT_NUMBER[0]\n",
        "\n",
        "PARENT = \"projects/{}/locations/{}\".format(PROJECT_ID, REGION)\n",
        "\n",
        "print(\"ENDPOINT: {}\".format(ENDPOINT))\n",
        "print(\"PROJECT_ID: {}\".format(PROJECT_ID))\n",
        "print(\"REGION: {}\".format(REGION))\n",
        "\n",
        "!gcloud config set project {PROJECT_ID}\n",
        "!gcloud config set ai_platform/region {REGION}"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "lR6Wwv-hCCN-"
      },
      "source": [
        "## Prepare the Data\n",
        "\n",
        "The GloVe dataset consists of a set of pre-trained embeddings. The embeddings are split into a \"train\" split, and a \"test\" split.\n",
        "We will create a vector search index from the \"train\" split, and use the embedding vectors in the \"test\" split as query vectors to test the vector search index.\n",
        "\n",
        "NOTE: While the data split uses the term \"train\", these are pre-trained embeddings and thus are ready to be indexed for search. The terms \"train\" and \"test\" split are used just to be consistent with usual machine learning terminology.\n",
        "\n",
        "Download the GloVe dataset.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "9wzS85TeB9dG"
      },
      "outputs": [],
      "source": [
        "! gsutil cp gs://cloud-samples-data/ai-platform-unified/matching_engine/glove-100-angular.hdf5 ."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "4fAO9CMoCNtq"
      },
      "source": [
        "Read the data into memory.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "lZ3JQTS6CN-3"
      },
      "outputs": [],
      "source": [
        "# The number of nearest neighbors to be retrieved from database for each query.\n",
        "k = 10\n",
        "\n",
        "h5 = h5py.File(\"glove-100-angular.hdf5\", \"r\")\n",
        "train = h5[\"train\"]\n",
        "test = h5[\"test\"]"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "pE6bBBo7GjJK"
      },
      "outputs": [],
      "source": [
        "train[0]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "aQIQSyF9GtSv"
      },
      "source": [
        "Save the train split in JSONL format.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "18wCiTwfG40P"
      },
      "outputs": [],
      "source": [
        "with open(\"glove100.json\", \"w\") as f:\n",
        "    for i in range(len(train)):\n",
        "        f.write('{\"id\":\"' + str(i) + '\",')\n",
        "        f.write('\"embedding\":[' + \",\".join(str(x) for x in train[i]) + \"]}\")\n",
        "        f.write(\"\\n\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "QuVl8DrWG8NS"
      },
      "source": [
        "Upload the training data to GCS."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Gk6YmPMoG8aX"
      },
      "outputs": [],
      "source": [
        "# NOTE: Everything in this GCS DIR will be DELETED before uploading the data.\n",
        "\n",
        "! gsutil rm -rf {BUCKET_NAME}"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "3PgsA_vbI8Vg"
      },
      "outputs": [],
      "source": [
        "! gsutil cp glove100.json {BUCKET_NAME}/glove100.json"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "3RX6g7FaJFes"
      },
      "outputs": [],
      "source": [
        "! gsutil ls {BUCKET_NAME}"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "mglUPwHpJH98"
      },
      "source": [
        "## Create Indexes\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "qhIBCQ7dDSbW"
      },
      "source": [
        "### Create ANN Index (for Production Usage)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "DDAvm_mj_BVs"
      },
      "outputs": [],
      "source": [
        "index_client = aiplatform_v1beta1.IndexServiceClient(\n",
        "    client_options=dict(api_endpoint=ENDPOINT)\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "qiIg9b5zJLi1"
      },
      "outputs": [],
      "source": [
        "DIMENSIONS = 100\n",
        "DISPLAY_NAME = \"glove_100_1\"\n",
        "DISPLAY_NAME_BRUTE_FORCE = DISPLAY_NAME + \"_brute_force\""
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "svLYiDf0OD2G"
      },
      "source": [
        "Create the ANN index configuration:\n",
        "\n",
        "Please read the documentation to understand the various configuration parameters that can be used to tune the index\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Tfa8IoNrOCJh"
      },
      "outputs": [],
      "source": [
        "treeAhConfig = struct_pb2.Struct(\n",
        "    fields={\n",
        "        \"leafNodeEmbeddingCount\": struct_pb2.Value(number_value=500),\n",
        "        \"leafNodesToSearchPercent\": struct_pb2.Value(number_value=7),\n",
        "    }\n",
        ")\n",
        "\n",
        "algorithmConfig = struct_pb2.Struct(\n",
        "    fields={\"treeAhConfig\": struct_pb2.Value(struct_value=treeAhConfig)}\n",
        ")\n",
        "\n",
        "config = struct_pb2.Struct(\n",
        "    fields={\n",
        "        \"dimensions\": struct_pb2.Value(number_value=DIMENSIONS),\n",
        "        \"approximateNeighborsCount\": struct_pb2.Value(number_value=150),\n",
        "        \"distanceMeasureType\": struct_pb2.Value(string_value=\"DOT_PRODUCT_DISTANCE\"),\n",
        "        \"algorithmConfig\": struct_pb2.Value(struct_value=algorithmConfig),\n",
        "    }\n",
        ")\n",
        "\n",
        "metadata = struct_pb2.Struct(\n",
        "    fields={\n",
        "        \"config\": struct_pb2.Value(struct_value=config),\n",
        "        \"contentsDeltaUri\": struct_pb2.Value(string_value=BUCKET_NAME),\n",
        "    }\n",
        ")\n",
        "\n",
        "ann_index = {\n",
        "    \"display_name\": DISPLAY_NAME,\n",
        "    \"description\": \"Glove 100 ANN index\",\n",
        "    \"metadata\": struct_pb2.Value(struct_value=metadata),\n",
        "}"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "xzY7TpUSJcTV"
      },
      "outputs": [],
      "source": [
        "ann_index = index_client.create_index(parent=PARENT, index=ann_index)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "oLBD2xXG_tv7"
      },
      "outputs": [],
      "source": [
        "# Poll the operation until it's done successfullly.\n",
        "# This will take ~45 min.\n",
        "\n",
        "while True:\n",
        "    if ann_index.done():\n",
        "        break\n",
        "    print(\"Poll the operation to create index...\")\n",
        "    time.sleep(60)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "17jrQi501QyX"
      },
      "outputs": [],
      "source": [
        "INDEX_RESOURCE_NAME = ann_index.result().name\n",
        "INDEX_RESOURCE_NAME"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "kSsqZuyoA1SG"
      },
      "source": [
        "### Create Brute Force Index (for Ground Truth)\n",
        "\n",
        "The brute force index uses a naive brute force method to find the nearest neighbors. This method is not fast or efficient. Hence brute force indices are not recommended for production usage. They are to be used to find the \"ground truth\" set of neighbors, so that the \"ground truth\" set can be used to measure recall of the indices being tuned for production usage. To ensure an apples to apples comparison, the `distanceMeasureType` and `featureNormType`, `dimensions` of the brute force index should match those of the production indices being tuned.\n",
        "\n",
        "Create the brute force index configuration:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "5ExrilcZA87V"
      },
      "outputs": [],
      "source": [
        "from google.protobuf import *\n",
        "\n",
        "algorithmConfig = struct_pb2.Struct(\n",
        "    fields={\"bruteForceConfig\": struct_pb2.Value(struct_value=struct_pb2.Struct())}\n",
        ")\n",
        "\n",
        "config = struct_pb2.Struct(\n",
        "    fields={\n",
        "        \"dimensions\": struct_pb2.Value(number_value=DIMENSIONS),\n",
        "        \"approximateNeighborsCount\": struct_pb2.Value(number_value=150),\n",
        "        \"distanceMeasureType\": struct_pb2.Value(string_value=\"DOT_PRODUCT_DISTANCE\"),\n",
        "        \"algorithmConfig\": struct_pb2.Value(struct_value=algorithmConfig),\n",
        "    }\n",
        ")\n",
        "\n",
        "metadata = struct_pb2.Struct(\n",
        "    fields={\n",
        "        \"config\": struct_pb2.Value(struct_value=config),\n",
        "        \"contentsDeltaUri\": struct_pb2.Value(string_value=BUCKET_NAME),\n",
        "    }\n",
        ")\n",
        "\n",
        "brute_force_index = {\n",
        "    \"display_name\": DISPLAY_NAME_BRUTE_FORCE,\n",
        "    \"description\": \"Glove 100 index (brute force)\",\n",
        "    \"metadata\": struct_pb2.Value(struct_value=metadata),\n",
        "}"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "DXnBLqjXBsv8"
      },
      "outputs": [],
      "source": [
        "brute_force_index = index_client.create_index(parent=PARENT, index=brute_force_index)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "AtwELX4Peq2n"
      },
      "outputs": [],
      "source": [
        "# Poll the operation until it's done successfullly.\n",
        "# This will take ~45 min.\n",
        "\n",
        "while True:\n",
        "    if brute_force_index.done():\n",
        "        break\n",
        "    print(\"Poll the operation to create index...\")\n",
        "    time.sleep(60)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "_oD5SieYJbbW"
      },
      "outputs": [],
      "source": [
        "INDEX_BRUTE_FORCE_RESOURCE_NAME = brute_force_index.result().name\n",
        "INDEX_BRUTE_FORCE_RESOURCE_NAME"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "mglUPwHpJH98"
      },
      "source": [
        "## Update Indexes\n",
        "\n",
        "Create incremental data file.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "DDAvm_mj_BVs"
      },
      "outputs": [],
      "source": [
        "with open(\"glove100_incremental.json\", \"w\") as f:\n",
        "    f.write(\n",
        "        '{\"id\":\"0\",\"embedding\":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]}\\n'\n",
        "    )"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "svLYiDf0OD2G"
      },
      "source": [
        "Copy the incremental data file to a new subdirectory.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "DDAvm_mj_BVs"
      },
      "outputs": [],
      "source": [
        "! gsutil cp glove100_incremental.json {BUCKET_NAME}/incremental/glove100.json"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "svLYiDf0OD2G"
      },
      "source": [
        "Create update index request\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "DDAvm_mj_BVs"
      },
      "outputs": [],
      "source": [
        "metadata = struct_pb2.Struct(\n",
        "    fields={\n",
        "        \"contentsDeltaUri\": struct_pb2.Value(string_value=BUCKET_NAME + \"/incremental\"),\n",
        "    }\n",
        ")\n",
        "\n",
        "ann_index = {\n",
        "    \"name\": INDEX_RESOURCE_NAME,\n",
        "    \"display_name\": DISPLAY_NAME,\n",
        "    \"description\": \"Glove 100 ANN index\",\n",
        "    \"metadata\": struct_pb2.Value(struct_value=metadata),\n",
        "}"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "DDAvm_mj_BVs"
      },
      "outputs": [],
      "source": [
        "ann_index = index_client.update_index(index=ann_index)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "DDAvm_mj_BVs"
      },
      "outputs": [],
      "source": [
        "# Poll the operation until it's done successfullly.\n",
        "# This will take ~45 min.\n",
        "\n",
        "while True:\n",
        "    if ann_index.done():\n",
        "        break\n",
        "    print(\"Poll the operation to update index...\")\n",
        "    time.sleep(60)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "DDAvm_mj_BVs"
      },
      "outputs": [],
      "source": [
        "INDEX_RESOURCE_NAME = ann_index.result().name\n",
        "INDEX_RESOURCE_NAME"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "qV2xjAnDDObD"
      },
      "source": [
        "## Create an IndexEndpoint with VPC Network"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "2AWeQ6e04m36"
      },
      "outputs": [],
      "source": [
        "index_endpoint_client = aiplatform_v1beta1.IndexEndpointServiceClient(\n",
        "    client_options=dict(api_endpoint=ENDPOINT)\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "BpZQoJyxDlbO"
      },
      "outputs": [],
      "source": [
        "VPC_NETWORK_NAME = \"projects/{}/global/networks/{}\".format(PROJECT_NUMBER, NETWORK_NAME)\n",
        "VPC_NETWORK_NAME"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "mColTdkIDoVZ"
      },
      "outputs": [],
      "source": [
        "index_endpoint = {\n",
        "    \"display_name\": \"index_endpoint_for_demo\",\n",
        "    \"network\": VPC_NETWORK_NAME,\n",
        "}"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "QuARXzJVGyQX"
      },
      "outputs": [],
      "source": [
        "r = index_endpoint_client.create_index_endpoint(\n",
        "    parent=PARENT, index_endpoint=index_endpoint\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "YRL1AYF_HpZR"
      },
      "outputs": [],
      "source": [
        "r.result()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "PJ3bcZqi-cfM"
      },
      "outputs": [],
      "source": [
        "INDEX_ENDPOINT_NAME = r.result().name\n",
        "INDEX_ENDPOINT_NAME"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "np2cgVuuIe9k"
      },
      "source": [
        "## Deploy Indexes"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "8Ew1UgcIIiJG"
      },
      "source": [
        "### Deploy ANN Index"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "nLOYTGygIlMK"
      },
      "outputs": [],
      "source": [
        "DEPLOYED_INDEX_ID = \"ann_glove_deployed\""
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "M-W5LYQrKTzi"
      },
      "outputs": [],
      "source": [
        "deploy_ann_index = {\n",
        "    \"id\": DEPLOYED_INDEX_ID,\n",
        "    \"display_name\": DEPLOYED_INDEX_ID,\n",
        "    \"index\": INDEX_RESOURCE_NAME,\n",
        "}"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "_uK4WOgqN1NG"
      },
      "outputs": [],
      "source": [
        "r = index_endpoint_client.deploy_index(\n",
        "    index_endpoint=INDEX_ENDPOINT_NAME, deployed_index=deploy_ann_index\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "2Lt0jvsSeekz"
      },
      "outputs": [],
      "source": [
        "# Poll the operation until it is done.\n",
        "\n",
        "while not r.done():\n",
        "    print(\"Polling the deploy index operation...\")\n",
        "    time.sleep(60)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "8ajRpqe2J9aS"
      },
      "outputs": [],
      "source": [
        "r.result()"
      ]
    },
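    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The polling loop above waits indefinitely if the operation never finishes. Here is a minimal sketch of the same loop with a deadline, assuming only that the operation object exposes `done()` as used above. The helper name `wait_for_operation` and its `poll_seconds` and `timeout_seconds` parameters are hypothetical and not part of the Vertex AI SDK.\n",
        "\n",
        "```python\n",
        "import time\n",
        "\n",
        "\n",
        "def wait_for_operation(operation, poll_seconds=60, timeout_seconds=3600):\n",
        "    # Hypothetical helper: poll operation.done() until it returns True,\n",
        "    # and raise TimeoutError once timeout_seconds have elapsed.\n",
        "    deadline = time.monotonic() + timeout_seconds\n",
        "    while not operation.done():\n",
        "        if time.monotonic() >= deadline:\n",
        "            raise TimeoutError('operation did not finish before the deadline')\n",
        "        print('Polling the operation...')\n",
        "        time.sleep(poll_seconds)\n",
        "    return operation\n",
        "```\n",
        "\n",
        "For example, `wait_for_operation(r).result()` could stand in for the loop and the `r.result()` cell."
      ]
    },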
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "RNZnXmO5AhDO"
      },
      "source": [
        "### Deploy Brute Force Index"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "3p9e4828AkSv"
      },
      "outputs": [],
      "source": [
        "DEPLOYED_BRUTE_FORCE_INDEX_ID = \"glove_brute_force_deployed\""
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "6PgQKgHQAq3p"
      },
      "outputs": [],
      "source": [
        "deploy_brute_force_index = {\n",
        "    \"id\": DEPLOYED_BRUTE_FORCE_INDEX_ID,\n",
        "    \"display_name\": DEPLOYED_BRUTE_FORCE_INDEX_ID,\n",
        "    \"index\": INDEX_BRUTE_FORCE_RESOURCE_NAME,\n",
        "}"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "-2kgd01SA4rk"
      },
      "outputs": [],
      "source": [
        "r = index_endpoint_client.deploy_index(\n",
        "    index_endpoint=INDEX_ENDPOINT_NAME, deployed_index=deploy_brute_force_index\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "R6nZiQP-c2nu"
      },
      "outputs": [],
      "source": [
        "# Poll the operation until it is done.\n",
        "\n",
        "while not r.done():\n",
        "    print(\"Polling the deploy index operation...\")\n",
        "    time.sleep(60)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "2v7J36ShA9Sw"
      },
      "outputs": [],
      "source": [
        "r.result()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6LCGvBNvBd8D"
      },
      "source": [
        "## Create Online Queries\n",
        "\n",
        "After you have built and deployed your indexes, you can query a deployed index through the online querying gRPC API (Match service) from virtual machine instances in the same region (for example, `us-central1` in this tutorial).\n",
        "\n",
        "A client uses this gRPC API by following these steps:\n",
        "\n",
        "* Write `match_service.proto` locally\n",
        "* Clone the repository that contains the dependencies of `match_service.proto` in the Terminal:\n",
        "\n",
        "`$ mkdir third_party && cd third_party`\n",
        "\n",
        "`$ git clone https://github.com/googleapis/googleapis.git`\n",
        "\n",
        "* Compile the protocol buffer (see below)\n",
        "* Obtain the index endpoint\n",
        "* Use a code-generated stub to make the call, passing the parameter values"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "code",
        "id": "rUUW3ViQE88D"
      },
      "outputs": [],
      "source": [
        "%%writefile match_service.proto\n",
        "\n",
        "syntax = \"proto3\";\n",
        "\n",
        "package google.cloud.aiplatform.container.v1beta1;\n",
        "\n",
        "import \"google/rpc/status.proto\";\n",
        "\n",
        "// MatchService is a Google managed service for efficient vector similarity\n",
        "// search at scale.\n",
        "service MatchService {\n",
        "  // Returns the nearest neighbors for the query. If it is a sharded\n",
        "  // deployment, calls the other shards and aggregates the responses.\n",
        "  rpc Match(MatchRequest) returns (MatchResponse) {}\n",
        "\n",
        "  // Returns the nearest neighbors for batch queries. If it is a sharded\n",
        "  // deployment, calls the other shards and aggregates the responses.\n",
        "  rpc BatchMatch(BatchMatchRequest) returns (BatchMatchResponse) {}\n",
        "}\n",
        "\n",
        "// Parameters for a match query.\n",
        "message MatchRequest {\n",
        "  // The ID of the DeploydIndex that will serve the request.\n",
        "  // This MatchRequest is sent to a specific IndexEndpoint of the Control API,\n",
        "  // as per the IndexEndpoint.network. That IndexEndpoint also has\n",
        "  // IndexEndpoint.deployed_indexes, and each such index has an\n",
        "  // DeployedIndex.id field.\n",
        "  // The value of the field below must equal one of the DeployedIndex.id\n",
        "  // fields of the IndexEndpoint that is being called for this request.\n",
        "  string deployed_index_id = 1;\n",
        "\n",
        "  // The embedding values.\n",
        "  repeated float float_val = 2;\n",
        "\n",
        "  // The number of nearest neighbors to be retrieved from database for\n",
        "  // each query. If not set, will use the default from\n",
        "  // the service configuration.\n",
        "  int32 num_neighbors = 3;\n",
        "\n",
        "  // The list of restricts.\n",
        "  repeated Namespace restricts = 4;\n",
        "\n",
        "  // Crowding is a constraint on a neighbor list produced by nearest neighbor\n",
        "  // search requiring that no more than some value k' of the k neighbors\n",
        "  // returned have the same value of crowding_attribute.\n",
        "  // It's used for improving result diversity.\n",
        "  // This field is the maximum number of matches with the same crowding tag.\n",
        "  int32 per_crowding_attribute_num_neighbors = 5;\n",
        "\n",
        "  // The number of neighbors to find via approximate search before\n",
        "  // exact reordering is performed. If not set, the default value from scam\n",
        "  // config is used; if set, this value must be > 0.\n",
        "  int32 approx_num_neighbors = 6;\n",
        "\n",
        "  // The fraction of the number of leaves to search, set at query time allows\n",
        "  // user to tune search performance. This value increase result in both search\n",
        "  // accuracy and latency increase. The value should be between 0.0 and 1.0. If\n",
        "  // not set or set to 0.0, query uses the default value specified in\n",
        "  // NearestNeighborSearchConfig.TreeAHConfig.leaf_nodes_to_search_percent.\n",
        "  int32 leaf_nodes_to_search_percent_override = 7;\n",
        "}\n",
        "\n",
        "// Response of a match query.\n",
        "message MatchResponse {\n",
        "  message Neighbor {\n",
        "    // The ids of the matches.\n",
        "    string id = 1;\n",
        "\n",
        "    // The distances of the matches.\n",
        "    double distance = 2;\n",
        "  }\n",
        "  // All its neighbors.\n",
        "  repeated Neighbor neighbor = 1;\n",
        "}\n",
        "\n",
        "// Parameters for a batch match query.\n",
        "message BatchMatchRequest {\n",
        "  // Batched requests against one index.\n",
        "  message BatchMatchRequestPerIndex {\n",
        "    // The ID of the DeploydIndex that will serve the request.\n",
        "    string deployed_index_id = 1;\n",
        "\n",
        "    // The requests against the index identified by the above deployed_index_id.\n",
        "    repeated MatchRequest requests = 2;\n",
        "\n",
        "    // Selects the optimal batch size to use for low-level batching. Queries\n",
        "    // within each low level batch are executed sequentially while low level\n",
        "    // batches are executed in parallel.\n",
        "    // This field is optional, defaults to 0 if not set. A non-positive number\n",
        "    // disables low level batching, i.e. all queries are executed sequentially.\n",
        "    int32 low_level_batch_size = 3;\n",
        "  }\n",
        "\n",
        "  // The batch requests grouped by indexes.\n",
        "  repeated BatchMatchRequestPerIndex requests = 1;\n",
        "}\n",
        "\n",
        "// Response of a batch match query.\n",
        "message BatchMatchResponse {\n",
        "  // Batched responses for one index.\n",
        "  message BatchMatchResponsePerIndex {\n",
        "    // The ID of the DeployedIndex that produced the responses.\n",
        "    string deployed_index_id = 1;\n",
        "\n",
        "    // The match responses produced by the index identified by the above\n",
        "    // deployed_index_id. This field is set only when the query against that\n",
        "    // index succeed.\n",
        "    repeated MatchResponse responses = 2;\n",
        "\n",
        "    // The status of response for the batch query identified by the above\n",
        "    // deployed_index_id.\n",
        "    google.rpc.Status status = 3;\n",
        "  }\n",
        "\n",
        "  // The batched responses grouped by indexes.\n",
        "  repeated BatchMatchResponsePerIndex responses = 1;\n",
        "}\n",
        "\n",
        "// Namespace specifies the rules for determining the datapoints that are\n",
        "// eligible for each matching query, overall query is an AND across namespaces.\n",
        "message Namespace {\n",
        "  // The string name of the namespace that this proto is specifying,\n",
        "  // such as \"color\", \"shape\", \"geo\", or \"tags\".\n",
        "  string name = 1;\n",
        "\n",
        "  // The allowed tokens in the namespace.\n",
        "  repeated string allow_tokens = 2;\n",
        "\n",
        "  // The denied tokens in the namespace.\n",
        "  // The denied tokens have exactly the same format as the token fields, but\n",
        "  // represents a negation. When a token is denied, then matches will be\n",
        "  // excluded whenever the other datapoint has that token.\n",
        "  //\n",
        "  // For example, if a query specifies {color: red, blue, !purple}, then that\n",
        "  // query will match datapoints that are red or blue, but if those points are\n",
        "  // also purple, then they will be excluded even if they are red/blue.\n",
        "  repeated string deny_tokens = 3;\n",
        "}"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dfh48KLTJkaF"
      },
      "source": [
        "Compile the protocol buffer to generate `match_service_pb2.py` and `match_service_pb2_grpc.py`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "EehHF_AeGmQT"
      },
      "outputs": [],
      "source": [
        "! python -m grpc_tools.protoc -I=. --proto_path=third_party/googleapis --python_out=. --grpc_python_out=. match_service.proto"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "8wXTSgz1Bl0x"
      },
      "source": [
        "Obtain the private endpoint:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "zwA042IZJv1h"
      },
      "outputs": [],
      "source": [
        "# Look up the endpoint created above by name instead of taking the first\n",
        "# endpoint listed in the project, which may be a different one.\n",
        "DEPLOYED_INDEX_SERVER_IP = (\n",
        "    index_endpoint_client.get_index_endpoint(name=INDEX_ENDPOINT_NAME)\n",
        "    .deployed_indexes[0]\n",
        "    .private_endpoints.match_grpc_address\n",
        ")\n",
        "DEPLOYED_INDEX_SERVER_IP"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "IcXa9lSuB9AT"
      },
      "source": [
        "Test your query:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "zhgTqsI1spsH"
      },
      "outputs": [],
      "source": [
        "import grpc\n",
        "import match_service_pb2\n",
        "import match_service_pb2_grpc\n",
        "\n",
        "# The private endpoint is reached over an insecure channel on port 10000.\n",
        "channel = grpc.insecure_channel(\"{}:10000\".format(DEPLOYED_INDEX_SERVER_IP))\n",
        "stub = match_service_pb2_grpc.MatchServiceStub(channel)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "A3KYVw5HB-4v"
      },
      "outputs": [],
      "source": [
        "# Test query\n",
        "query = [\n",
        "    -0.11333,\n",
        "    0.48402,\n",
        "    0.090771,\n",
        "    -0.22439,\n",
        "    0.034206,\n",
        "    -0.55831,\n",
        "    0.041849,\n",
        "    -0.53573,\n",
        "    0.18809,\n",
        "    -0.58722,\n",
        "    0.015313,\n",
        "    -0.014555,\n",
        "    0.80842,\n",
        "    -0.038519,\n",
        "    0.75348,\n",
        "    0.70502,\n",
        "    -0.17863,\n",
        "    0.3222,\n",
        "    0.67575,\n",
        "    0.67198,\n",
        "    0.26044,\n",
        "    0.4187,\n",
        "    -0.34122,\n",
        "    0.2286,\n",
        "    -0.53529,\n",
        "    1.2582,\n",
        "    -0.091543,\n",
        "    0.19716,\n",
        "    -0.037454,\n",
        "    -0.3336,\n",
        "    0.31399,\n",
        "    0.36488,\n",
        "    0.71263,\n",
        "    0.1307,\n",
        "    -0.24654,\n",
        "    -0.52445,\n",
        "    -0.036091,\n",
        "    0.55068,\n",
        "    0.10017,\n",
        "    0.48095,\n",
        "    0.71104,\n",
        "    -0.053462,\n",
        "    0.22325,\n",
        "    0.30917,\n",
        "    -0.39926,\n",
        "    0.036634,\n",
        "    -0.35431,\n",
        "    -0.42795,\n",
        "    0.46444,\n",
        "    0.25586,\n",
        "    0.68257,\n",
        "    -0.20821,\n",
        "    0.38433,\n",
        "    0.055773,\n",
        "    -0.2539,\n",
        "    -0.20804,\n",
        "    0.52522,\n",
        "    -0.11399,\n",
        "    -0.3253,\n",
        "    -0.44104,\n",
        "    0.17528,\n",
        "    0.62255,\n",
        "    0.50237,\n",
        "    -0.7607,\n",
        "    -0.071786,\n",
        "    0.0080131,\n",
        "    -0.13286,\n",
        "    0.50097,\n",
        "    0.18824,\n",
        "    -0.54722,\n",
        "    -0.42664,\n",
        "    0.4292,\n",
        "    0.14877,\n",
        "    -0.0072514,\n",
        "    -0.16484,\n",
        "    -0.059798,\n",
        "    0.9895,\n",
        "    -0.61738,\n",
        "    0.054169,\n",
        "    0.48424,\n",
        "    -0.35084,\n",
        "    -0.27053,\n",
        "    0.37829,\n",
        "    0.11503,\n",
        "    -0.39613,\n",
        "    0.24266,\n",
        "    0.39147,\n",
        "    -0.075256,\n",
        "    0.65093,\n",
        "    -0.20822,\n",
        "    -0.17456,\n",
        "    0.53571,\n",
        "    -0.16537,\n",
        "    0.13582,\n",
        "    -0.56016,\n",
        "    0.016964,\n",
        "    0.1277,\n",
        "    0.94071,\n",
        "    -0.22608,\n",
        "    -0.021106,\n",
        "]\n",
        "\n",
        "request = match_service_pb2.MatchRequest()\n",
        "request.deployed_index_id = DEPLOYED_INDEX_ID\n",
        "for val in query:\n",
        "    request.float_val.append(val)\n",
        "\n",
        "response = stub.Match(request)\n",
        "response"
      ]
    },
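    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The `MatchResponse` message defined in `match_service.proto` carries a repeated `neighbor` field whose entries have an `id` and a `distance`. Here is a minimal sketch of unpacking it into plain Python tuples. `neighbors_as_pairs` is a hypothetical helper name, and the stand-in response is made up for illustration so that the snippet runs without a deployed index.\n",
        "\n",
        "```python\n",
        "from types import SimpleNamespace\n",
        "\n",
        "\n",
        "def neighbors_as_pairs(response):\n",
        "    # Return (id, distance) tuples in the order the service returned them.\n",
        "    return [(n.id, n.distance) for n in response.neighbor]\n",
        "\n",
        "\n",
        "# Stand-in with the same attribute names as MatchResponse (illustrative values).\n",
        "fake_response = SimpleNamespace(\n",
        "    neighbor=[\n",
        "        SimpleNamespace(id='17', distance=0.5),\n",
        "        SimpleNamespace(id='42', distance=0.25),\n",
        "    ]\n",
        ")\n",
        "print(neighbors_as_pairs(fake_response))\n",
        "```\n",
        "\n",
        "With a live stub, `neighbors_as_pairs(stub.Match(request))` would return the same structure."
      ]
    },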
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_mNwdU9_B_Ez"
      },
      "source": [
        "### Batch Query\n",
        "\n",
        "You can run multiple queries in a single RPC call using the BatchMatch API:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "L55vcqox5cQz"
      },
      "outputs": [],
      "source": [
        "def get_request(embedding, deployed_index_id):\n",
        "    request = match_service_pb2.MatchRequest(num_neighbors=k)\n",
        "    request.deployed_index_id = deployed_index_id\n",
        "    for val in embedding:\n",
        "        request.float_val.append(val)\n",
        "    return request"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "A3KYVw5HB-4v"
      },
      "outputs": [],
      "source": [
        "# Test query\n",
        "queries = [\n",
        "    [\n",
        "        -0.11333,\n",
        "        0.48402,\n",
        "        0.090771,\n",
        "        -0.22439,\n",
        "        0.034206,\n",
        "        -0.55831,\n",
        "        0.041849,\n",
        "        -0.53573,\n",
        "        0.18809,\n",
        "        -0.58722,\n",
        "        0.015313,\n",
        "        -0.014555,\n",
        "        0.80842,\n",
        "        -0.038519,\n",
        "        0.75348,\n",
        "        0.70502,\n",
        "        -0.17863,\n",
        "        0.3222,\n",
        "        0.67575,\n",
        "        0.67198,\n",
        "        0.26044,\n",
        "        0.4187,\n",
        "        -0.34122,\n",
        "        0.2286,\n",
        "        -0.53529,\n",
        "        1.2582,\n",
        "        -0.091543,\n",
        "        0.19716,\n",
        "        -0.037454,\n",
        "        -0.3336,\n",
        "        0.31399,\n",
        "        0.36488,\n",
        "        0.71263,\n",
        "        0.1307,\n",
        "        -0.24654,\n",
        "        -0.52445,\n",
        "        -0.036091,\n",
        "        0.55068,\n",
        "        0.10017,\n",
        "        0.48095,\n",
        "        0.71104,\n",
        "        -0.053462,\n",
        "        0.22325,\n",
        "        0.30917,\n",
        "        -0.39926,\n",
        "        0.036634,\n",
        "        -0.35431,\n",
        "        -0.42795,\n",
        "        0.46444,\n",
        "        0.25586,\n",
        "        0.68257,\n",
        "        -0.20821,\n",
        "        0.38433,\n",
        "        0.055773,\n",
        "        -0.2539,\n",
        "        -0.20804,\n",
        "        0.52522,\n",
        "        -0.11399,\n",
        "        -0.3253,\n",
        "        -0.44104,\n",
        "        0.17528,\n",
        "        0.62255,\n",
        "        0.50237,\n",
        "        -0.7607,\n",
        "        -0.071786,\n",
        "        0.0080131,\n",
        "        -0.13286,\n",
        "        0.50097,\n",
        "        0.18824,\n",
        "        -0.54722,\n",
        "        -0.42664,\n",
        "        0.4292,\n",
        "        0.14877,\n",
        "        -0.0072514,\n",
        "        -0.16484,\n",
        "        -0.059798,\n",
        "        0.9895,\n",
        "        -0.61738,\n",
        "        0.054169,\n",
        "        0.48424,\n",
        "        -0.35084,\n",
        "        -0.27053,\n",
        "        0.37829,\n",
        "        0.11503,\n",
        "        -0.39613,\n",
        "        0.24266,\n",
        "        0.39147,\n",
        "        -0.075256,\n",
        "        0.65093,\n",
        "        -0.20822,\n",
        "        -0.17456,\n",
        "        0.53571,\n",
        "        -0.16537,\n",
        "        0.13582,\n",
        "        -0.56016,\n",
        "        0.016964,\n",
        "        0.1277,\n",
        "        0.94071,\n",
        "        -0.22608,\n",
        "        -0.021106,\n",
        "    ],\n",
        "    [\n",
        "        -0.99544,\n",
        "        -2.3651,\n",
        "        -0.24332,\n",
        "        -1.0321,\n",
        "        0.42052,\n",
        "        -1.1817,\n",
        "        -0.16451,\n",
        "        -1.683,\n",
        "        0.49673,\n",
        "        -0.27258,\n",
        "        -0.025397,\n",
        "        0.34188,\n",
        "        1.5523,\n",
        "        1.3532,\n",
        "        0.33297,\n",
        "        -0.0056677,\n",
        "        -0.76525,\n",
        "        0.49587,\n",
        "        1.2211,\n",
        "        0.83394,\n",
        "        -0.20031,\n",
        "        -0.59657,\n",
        "        0.38485,\n",
        "        -0.23487,\n",
        "        -1.0725,\n",
        "        0.95856,\n",
        "        0.16161,\n",
        "        -1.2496,\n",
        "        1.6751,\n",
        "        0.73899,\n",
        "        0.051347,\n",
        "        -0.42702,\n",
        "        0.16257,\n",
        "        -0.16772,\n",
        "        0.40146,\n",
        "        0.29837,\n",
        "        0.96204,\n",
        "        -0.36232,\n",
        "        -0.47848,\n",
        "        0.78278,\n",
        "        0.14834,\n",
        "        1.3407,\n",
        "        0.47834,\n",
        "        -0.39083,\n",
        "        -1.037,\n",
        "        -0.24643,\n",
        "        -0.75841,\n",
        "        0.7669,\n",
        "        -0.37363,\n",
        "        0.52741,\n",
        "        0.018563,\n",
        "        -0.51301,\n",
        "        0.97674,\n",
        "        0.55232,\n",
        "        1.1584,\n",
        "        0.73715,\n",
        "        1.3055,\n",
        "        -0.44743,\n",
        "        -0.15961,\n",
        "        0.85006,\n",
        "        -0.34092,\n",
        "        -0.67667,\n",
        "        0.2317,\n",
        "        1.5582,\n",
        "        1.2308,\n",
        "        -0.62213,\n",
        "        -0.032801,\n",
        "        0.1206,\n",
        "        -0.25899,\n",
        "        -0.02756,\n",
        "        -0.52814,\n",
        "        -0.93523,\n",
        "        0.58434,\n",
        "        -0.24799,\n",
        "        0.37692,\n",
        "        0.86527,\n",
        "        0.069626,\n",
        "        1.3096,\n",
        "        0.29975,\n",
        "        -1.3651,\n",
        "        -0.32048,\n",
        "        -0.13741,\n",
        "        0.33329,\n",
        "        -1.9113,\n",
        "        -0.60222,\n",
        "        -0.23921,\n",
        "        0.12664,\n",
        "        -0.47961,\n",
        "        -0.89531,\n",
        "        0.62054,\n",
        "        0.40869,\n",
        "        -0.08503,\n",
        "        0.6413,\n",
        "        -0.84044,\n",
        "        -0.74325,\n",
        "        -0.19426,\n",
        "        0.098722,\n",
        "        0.32648,\n",
        "        -0.67621,\n",
        "        -0.62692,\n",
        "    ],\n",
        "]\n",
        "\n",
        "batch_request = match_service_pb2.BatchMatchRequest()\n",
        "batch_request_ann = match_service_pb2.BatchMatchRequest.BatchMatchRequestPerIndex()\n",
        "batch_request_brute_force = (\n",
        "    match_service_pb2.BatchMatchRequest.BatchMatchRequestPerIndex()\n",
        ")\n",
        "batch_request_ann.deployed_index_id = DEPLOYED_INDEX_ID\n",
        "batch_request_brute_force.deployed_index_id = DEPLOYED_BRUTE_FORCE_INDEX_ID\n",
        "for query in queries:\n",
        "    batch_request_ann.requests.append(get_request(query, DEPLOYED_INDEX_ID))\n",
        "    batch_request_brute_force.requests.append(\n",
        "        get_request(query, DEPLOYED_BRUTE_FORCE_INDEX_ID)\n",
        "    )\n",
        "batch_request.requests.append(batch_request_ann)\n",
        "batch_request.requests.append(batch_request_brute_force)\n",
        "\n",
        "response = stub.BatchMatch(batch_request)\n",
        "response"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_mNwdU9_B_Ez"
      },
      "source": [
        "### Compute Recall\n",
        "\n",
        "Use the deployed brute-force index as the ground truth to calculate the recall of the ANN index:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "L55vcqox5cQz"
      },
      "outputs": [],
      "source": [
        "def get_neighbors(embedding, deployed_index_id):\n",
        "    request = match_service_pb2.MatchRequest(num_neighbors=k)\n",
        "    request.deployed_index_id = deployed_index_id\n",
        "    for val in embedding:\n",
        "        request.float_val.append(val)\n",
        "    response = stub.Match(request)\n",
        "    return [int(n.id) for n in response.neighbor]"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "2vqQxfD9ufJm"
      },
      "outputs": [],
      "source": [
        "# This will take 5-10 min\n",
        "\n",
        "recall = sum(\n",
        "    [\n",
        "        len(\n",
        "            set(get_neighbors(test[i], DEPLOYED_BRUTE_FORCE_INDEX_ID)).intersection(\n",
        "                set(get_neighbors(test[i], DEPLOYED_INDEX_ID))\n",
        "            )\n",
        "        )\n",
        "        for i in range(len(test))\n",
        "    ]\n",
        ") / (1.0 * len(test) * k)\n",
        "\n",
        "print(\"Recall: {}\".format(recall))"
      ]
    },
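    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The expression above counts, for each test vector, how many of the `k` brute-force neighbors the ANN index also returned, then divides by the total number of ground-truth neighbors. Here is a minimal sketch of the same arithmetic on hand-made ID lists, so it runs without a deployed index. `recall_at_k` and the sample IDs are hypothetical.\n",
        "\n",
        "```python\n",
        "def recall_at_k(truth_ids, approx_ids, k):\n",
        "    # Count ground-truth neighbors that the approximate search also returned.\n",
        "    hits = sum(len(set(t) & set(a)) for t, a in zip(truth_ids, approx_ids))\n",
        "    return hits / (len(truth_ids) * k)\n",
        "\n",
        "\n",
        "# Hand-made neighbor IDs for two queries with k = 3 (illustrative only).\n",
        "truth = [[1, 2, 3], [4, 5, 6]]\n",
        "approx = [[1, 2, 9], [4, 5, 6]]\n",
        "print('Recall: {}'.format(recall_at_k(truth, approx, k=3)))\n",
        "```\n",
        "\n",
        "Here 5 of the 6 ground-truth neighbors are recovered, so the recall is 5/6."
      ]
    },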
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "TpV-iwP9qw9c"
      },
      "source": [
        "## Cleaning up\n",
        "\n",
        "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n",
        "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n",
        "You can also manually delete resources that you created by running the following code."
      ]
    },
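    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The API reference for `DeleteIndex` states that an Index can only be deleted once all of its DeployedIndexes have been undeployed, so the deletion cells below are likely to fail while the two indexes are still deployed. Here is a minimal sketch of undeploying first, assuming the client's `undeploy_index` method accepts `index_endpoint` and `deployed_index_id` and returns a long-running operation. `undeploy_all` is a hypothetical helper name.\n",
        "\n",
        "```python\n",
        "def undeploy_all(client, index_endpoint_name, deployed_index_ids):\n",
        "    # Undeploy each deployed index and block until its operation finishes.\n",
        "    for deployed_index_id in deployed_index_ids:\n",
        "        operation = client.undeploy_index(\n",
        "            index_endpoint=index_endpoint_name,\n",
        "            deployed_index_id=deployed_index_id,\n",
        "        )\n",
        "        operation.result()\n",
        "```\n",
        "\n",
        "In this notebook the call would be `undeploy_all(index_endpoint_client, INDEX_ENDPOINT_NAME, [DEPLOYED_INDEX_ID, DEPLOYED_BRUTE_FORCE_INDEX_ID])`, run before the deletion cells."
      ]
    },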
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "sx_vKniMq9ZX"
      },
      "outputs": [],
      "source": [
        "index_client.delete_index(name=INDEX_RESOURCE_NAME)\n",
        "index_client.delete_index(name=INDEX_BRUTE_FORCE_RESOURCE_NAME)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "omj7N9iWv-Tq"
      },
      "outputs": [],
      "source": [
        "index_endpoint_client.delete_index_endpoint(name=INDEX_ENDPOINT_NAME)"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [],
      "name": "matching_engine_for_indexing.ipynb",
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
